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Microarray experiments generate a high data volume. However, often due to financial or exper- 
imental considerations, e.g. lack of sample, there is little or no replication of the experiments or 
hybridizations. These factors combined with the intrinsic variability associated with the measure- 
ment of gene expression can result in an unsatisfactory detection rate of differential gene expression 
(DGE). Our motivation was to provide an easy to use measure of the success rate of DGE detection 
that could find routine use in the design of microarray experiments or in post-experiment assessment. 

In this study, we address the problem of both random errors and systematic biases in microarray 
experimentation. We propose a mathematical model for the measured data in microarray exper- 
iments and on the basis of this model present a t-based statistical procedure to determine DGE. 
We have derived a formula to determine the success rate of DGE detection that takes into account 
the number of microarrays, the number of genes, the magnitude of DGE, and the variance from 
biological and technical sources. The formula and look-up tables based on the formula, can be used 
to assist in the design of microarray experiments. We also propose an ad hoc method for estimating 
the fraction of non-differentially expressed genes within a set of genes being tested. This will help 
to increase the power of DGE detection. 

The functions to calculate the success rate of DGE detection have been implemented as a Java 
application, which is accessible at 

http:/ /www. le.ac.uk/mrctox/microarray_lab/Microarray_Softwares/Microarray_Softwares. htm 
Supplementary information at ftp://alcyone.mrc. le.ac.uk/ Pub/twgl/BioInf03-0661suppl.pdf 



I. INTRODUCTION 

Whole genome sequencing and the related develop- 
ment of microarrays have given researchers unprece- 
dented power to simultaneously determine the expres- 
sions of many thousands of genes pj. However, a sta- 
tistical challenge facing microarray analysis is to identify 
differential gene expression (DGE) with a high rate of 
success and low rate of false positives. Such a method is 
required because of the number of gene expressions being 
simultaneously determined, and the variation associated 
with each can give rise to an unacceptably large number 
of false positives or low successful detection rate. The 
variations associated with gene expression experiments 
can be categorized into two sets. First, there are inter- 
individual differences between members of a population, 
thus sufficient biological individuals should be included 
in the experiments in order to account for the biological 
variation. Second, there are always technical errors aris- 
ing from the experimental procedure, which may be fur- 
ther sub-categorized into random errors and systematic 
biases. Unlike random errors, which can be reduced by 
making multiple measurements, systematic biases cannot 
be reduced by simply doing more measurements, correct 
experimental designs must be employed to negate them. 

One of the most serious sources of systematic bias in 
microarray experiments (for dual label hybridizations) 
is the imbalance in the measured fluorescence intensi- 
ties between the two fluorescent channels C__l EH - A 
manifestation of this systematic bias is that when two 
identical mRNA samples are labelled with different fluo- 
rescent dyes and hybridized to the same microarray slide, 
one channel has a higher average fluorescence level than 



the other. To complicate matters further the imbalance 
of the two channels is not uniform, but varies from fea- 
ture to feature. A feature is the area of fluorescence on 
a microarray corresponding to one gene and where hy- 
bridization of the labelled nucleic acids derived from this 
gene has taken place 5] . To correct the labelling dye im- 
balance, different methods of normalizing the microar- 
ray data by adjustingthe measured fluorescence levels 
have been proposed 0, IE 0- These methods can be 
roughly classified into two categories. First, global nor- 
malization, in which the fluorescence levels of all the fea- 
tures are globally (uniformly) adjusted (by shifting or 
re-scaling) to fulfill some assumptions about the relative 
expressions of the genes, e.g. most genes are not differ- 
entially expressed between the two samples 6J. However, 
because global normalization adjusts the fluorescence lev- 
els of all features uniformly, it cannot account for the 
different magnitudes of imbalances from feature to fea- 
ture, so a second type of normalization method is often 
employed to take account of this variation. This normal- 
ization method adjusts the fluorescence level according 
to some local properties of the feature spot, e.g. the 
overall brightness of the spot Q, and usually involves 
fitting the measured data with a non-linear smoothed 
curve. The fluorescence level is then adjusted according 
the smoothed curve, which is assumed to describe the 
dependence of the imbalance on spot fluorescence inten- 
sity. But the fluorescence imbalances between the two 
channels arc more complicated than can be described by 
a smoothed curve. Due to irregular intrinsic fluorescence 
of the microarray slide and possibly some gene-specific 
effect [1, __|; it is unlikely that the fluorescence imbal- 
ance can be corrected for all features by the intensity- 
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dependent normalization. To correct the fluorescence im- 
balance of each feature a simple method is to reverse the 
labelling dyes when hybridizing some microarrays. Kerr 
et al first proposed an ANOVA model for microarray 
data, and showed that ANOVA methods can be used to 
normalize the data and estimate real changes in gene ex- 
pression. Taking biological variations into account, Dob- 
bin et al |2| have addressed the problem of statistical 
design of reverse dye microarrays to minimize variance 
with a given number of microarray slides. We have taken 
the analysis further to address the problem of identify- 
ing DGEs with a desired detection power and controlled 
number of false positives. A model and statistical testing 
procedure are presented in the following sections to assist 
research workers in the selection of an appropriate num- 
ber of microarrays for an experiment in order to achieve 
the desired detection power, or alternatively in assessing 
the detection power achievable when the experiment has 
been done. 



II. THE MODEL 

The experimental situation analyzed here is one where 
there are two sample groups. One of the groups might 
have been subjected to an event such as chemical expo- 
sure, the other being a suitable control, or the two groups 
might be normal and tumor tissues or different organs. 
For convenience the two groups will be designated as the 
treated and control groups. 

In cells, the amount of mRNA corresponding to a par- 
ticular gene is taken to correspond to the expression level 
of that gene. A microarray is a means to translate the 
level of mRNA for many genes, which cannot be mea- 
sured directly, into fluorescence that can be measured 
directly. The model presented in this paper is designed 
for experiments where each gene is spotted only once on 
each microarray, and each individual sample is hybridized 
only once using one microarray. For the purpose of intro- 
duction consider one single feature spot on the microar- 
ray. We assume that the log-intensity fluorescence of 
this feature takes additive contributions from the follow- 
ing sources: the amount of corresponding mRNA in the 
biological sample, an effect from the quality of the fea- 
ture spot, an effect from the labelling fluorochrome (in- 
cluding the efficiency of labelling with the fluorochrome, 
and possible pre-existing intrinsic fluorescence in favor of 
this fluorochrome), and the random measurement error. 
Therefore we have the following model 

Gv,i,s,c — ^v.i A s -\- D c ^v,i,s,c: (1) 

where G v ^ tS ^ c is the log-intensity (base 2 logarithms are 
utilized throughout this paper) of fluorescence of the fea- 
ture spot; 

I Vi i is the expression level of the gene in the ith indi- 
vidual sample of group v (v = t for the treated group, or 
v = c for the control group, and i = 1 • • • n where n is the 
number of individuals in each group). I v j is assumed to 



be independently and normally distributed with a mean 
E v and a variance of, denoted by I Vi i ~ N(E v ,al); 

A s is the effect of feature spot quality, which is assumed 
to be fixed for microarray slide s and independent of 
fluorescent label used; 

D c is the effect of the fluorescent label c (c = g for 
green dye, and c = r for red dye), which is assumed to be 
fixed with label c and independent of microarray slide; 

£v,i,s,c is the random error term which is assumed to 
be independently and normally distributed with a mean 
and a variance of , denoted by e v ,i,s,c ~ N(Q, of). 

Note that for each of the features on the microarray 
the log-intensity is described in the same form Eq.Q. 
Although the equations are in the same form for each 
feature the actual values of E v , cr%, A s , D c , of will be 
feature dependent. 

E v is the mean expression level of the gene in the sam- 
ple group v, so in comparing a gene's expression between 
the treated and control groups, the quantity of interest 
is E t — E c , the magnitude of differential expression. The 
effects of feature spot quality A s and fluorescent dye D c 
are not of interest and therefore need to be eliminated by 
a suitable experimental design. 



III. EXPERIMENTAL SETUP 

Let's introduce the notation (cj,U) to represent a mi- 
croarray as a result of the following hybridization: indi- 
vidual sample Cj labelled with green dye and individual 
sample ti labelled with red dye. Here c and t indicate the 
sample group while the subscripts index different indi- 
viduals in each group. As a convention, the first sample 
in the parenthesis is always labelled with green dye and 
the second with red dye. 

Consider the microarray (cj,ti). Here an individual j 
is taken from the control group (v = c) and an individual 
i from the treated group (v — t). RNA is extracted from 
both and converted to labelled cDNA using fluorescent 
labels green and red respectively. These are then simulta- 
neously hybridized to the microarray a. This method of 
labelling (control sample with green and treated sample 
with red) is referred to as forward labelling. As a result 
of this experiment we can derive from Eq. Q 

Gc,j,a,g Ic.j ~t~ ^« ~t~ ~~\~~ £c,j,a,gi 
Gt,i,a,r = It,i + A a + D r + £t,i,a,r- 

The difference F a between the two fluorescence log- 
intensities is therefore 

Eq. Gt,i,a,r Gc,j } a,g — ^t,i *c,j 

~\~E) r Dg -\- Et.i.a.r £c,j,a,g: (2) 

and F a is normally distributed with an expected value 
(mean) E t — E c + D r — D g and a variance of + of + 2of . 
Note that taking the difference of G c j. a . g and Gt,i. a .r 
causes the spot effect A a to be cancelled out and it does 
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not therefore contribute to F a . However, there is still the 
labelling fluor effect D r — D g to consider. To eliminate 
this effect microarrays with reverse labelling are required. 

Consider the microarray (ti',Cj/), where another two 
individuals, i 1 from the treated and j' from the control 
groups are hybridized to another microarray b. On this 
occasion the individual from the control group j' is la- 
belled with red and the individual from the treated group 
i' with green. This method of labelling (control sample 
with red and treated sample with green) is referred to 
as reverse labelling. From this microarray b we get: 

Gt,i',b,g = h,V + Ab + D g + et,i',b,g, 
Gc,j\b,r — Icj' Ab ~\~ Df -\- e c _j' b,r- 

The difference of the two log-intensities is 



Bb — G 



G, 



c,j f ,b,r 



= 1 



+D q - D r + e t ,i>b,. 



W J-cj' 



(3) 



and Bb is normally distributed with an expected value 
E t — E c + D g — D r and a variance of + of + 2 of . 

The quantity F a (or Bb) is the difference of two log- 
intensities and is therefore equivalent to the logarithm 
of the ratio of two intensities. Thus F a (or Bb) is often 
called the log-ratio of a gene. The variance of the log-ratio 
of a gene, a\ = a^+a^ + 2 of , is the sum of the biological 
variance of the control of, the treated of, and the mea- 
surement variances associated with them 2of . Hereafter 
o~\ is referred to as the total variance of the log-ratio of 
the gene. 

From Eqs.© and © it is clear that by combining 
measurements from both forward and reverse labelled mi- 
croarrays, it is possible to eliminate the fluorescent label 
bias. One simple way of doing this is to take the average 
of Eqs.© and ©■ The expected value of this average is 
then E t — E c , which is the quantity of interest. The above 
arguments therefore show that to eliminate the spot ef- 
fect A s , we need to hybridize the control and treated 
samples onto the same microarray slide. To cancel out 
the fluorescent label effect D c we need to do both forward 
labelled and reverse labelled microarrays. A general for- 
malism is presented in the following sections to deal with 
situations where the number of forward labelled microar- 
rays and the number of reverse labelled microarrays are 
not necessarily the same. 

We will consider the following experiment: 

( c l; tl), (C2, £2), ' ' ' , (Crif-l, trif-l), (Cn f , t nf ) 
3 Cn/ + 1 ) ) \Pn f +2 ) C71 ^ +2) ) ' ' ' ) (t n j -i r n r , Cn/+n r ) ■ 

In this experiment there are in total nf + n r microar- 
rays, 71/ of them are forward labelled, and the rest n r 
are reverse labelled. In relation to similar studies by 
other authors^, 0, ^| using replicated microarrays, 
this study focuses on a special case of microarray ex- 
periment designs, i.e., direct comparison between two 
groups with biological but no technical replicates in each 



group. It is a special case of the balanced block de- 
sign as described by 01- They have showed that the 
balanced block design is the most efficient experimental 
setup when comparing two classes with a given number of 
microarrays. The limitation of this experimental setup, 
as Dobbin and Simon pointed out for the balanced block 
design, is that it is not suitable for clustering analysis. 



IV. DETECTING DGEs 

A. Hypothesis test 

For each gene printed on the microarrays, we want 
perform a statistical test to determine whether this gene 
is differentially expressed to a significant degree in the 
treated group compared to the control group. The null 
hypothesis is that the gene has the same expression level 
in the two groups: 

Null hypothesis H : E c = E t (4) 
Alternative hypothesis Hi : E c ^ E t (5) 

From each of the nf forward labelled microarrays an 
intra-array log-ratio Fi between the treated sample and 
the control sample is obtained, and similarly from each of 
the n r reverse labelled microarrays a log-ratio Bj. Each 
Fi has an expected value E t — E c + D r — D g , so the 
average F = Y^iL\Fil n j has the same expected value. 
Similarly the average B — X)j=i Bj/n r has an expected 
value E t — E c — D r + D g . Averaging F and B gives 



F + B 



1 «/ 1 n r 

J i—1 



(6) 



which will have an expected value E t — E c , so R is an 
unbiased estimator of our quantity of interest. Also R is 
normally distributed with a variance 



1 



1 



(7) 



When the total number of microarrays nf + n r is fixed, 
the variance of R is minimized at nf = n r , so whenever 
possible, equal numbers of forward and reverse labelled 
microarrays should be combined. The variances of, of , 
and of are unknowns, but fortunately there is no need to 
estimate them individually. For the purposes of identify- 
ing differential gene expression, estimating o~\ as a whole 
is sufficient and a\ can be estimated using its un-biascd 
estimator 



1 



n f + n r 



»f 



Y^i^-Ff+Y^^-Bf 

i=l J=l 



(8) 



and (nf + n r — 2)s 2 ja\ will follow the x 2 distribution 
with nf + n r — 2 degrees of freedom, independent of F 
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and B ljj , thus s 2 is independent of R. Note that in order 
to estimate E t — E c and a\ properly it is necessary that 
rij > 1, n r > 1, and rif + n r > 2. In other words there 
must be at least one forward and one reverse labelled 
microarray, and at least three microarrays in total. It is 
then apparent that 



t = 



R - {Et — E c ) 



(9) 



is distributed as the Student's t distribution with nj + 
n r — 2 degrees of freedom. In testing the null hypothesis 
Eq.|@}, we insert E t — E c into Eq.© and thus our test 
statistic to is defined as, 



to = 



R 



(10) 



Note that there is now no unknown quantity in Eq. fllUI . 
Under the null hypothesis that E t — E c , to follows the 
Student's distribution with n/+n r — 2 degrees of freedom. 
Based on the value of to the p-value of the test can be 
calculated. If the p-value calculated is larger than some 
pre-set threshold Pth , the null hypothesis is accepted that 
the gene has the same level of expression in both the 
control and treated groups. If the calculated p-value is 
smaller than the threshold Pth, it is declared that the test 
for this gene is positive, in the sense that its expression 
level in the treated group is different from that in the 
control group. Then depending on the sign of to the 
gene is either designated as up (to > 0) or down regulated 
{to < 0). 



B. Setting the threshold p-value 

A t test is performed for each gene, which is then de- 
clared as differentially expressed, or not, according to the 
above criteria. By adjusting the value of threshold Pth 
a control can be exerted on the number of false DGE 
calls made. By definition, p-value is the probability of 
observing a value of the statistic as extreme or more ex- 
treme than the observed value, under the condition that 
the null hypothesis is true. For each gene whose null hy- 
pothesis is true (we call each such gene a null gene), its 
p-value is uniformly distributed in (0,1). Therefore the 
probability that a null gene's p-value is smaller than P t h 
is just P t h- Suppose that in a total number N genes, 
Nq are null genes. When every gene on the microarray 
is tested, the number of false DGE calls Of p will has an 
expected value N P t h- So if one decides to tolerate an 
expected number Nf p false DGEs the threshold p-value 
should be set at P t h = Nf p /N . However, in reality only 
N is known and not Nq and therefore, it is necessary to 
make an estimation of No or Nq/N. Some methods for 
estimating Nq/N are discussed in Sec. IV Bl 



Once the threshold value Pth is set, the ability to detect 
genuine DGE, i.e. a gene with E t ^ E c , depends on the 
following factors: the magnitude of differential expression 
E t — E c , the total variance in one microarray experiment 
erf., and the number of forward and reverse labelled mi- 
croarrays. Among these factors, the ones over which ex- 
perimental control is exercised are nf and n r . In general 
the larger nf and n r , the more powerful will be the sta- 
tistical testing. The key question is therefore, how many 
forward and reverse labelled microarrays are required in 
order to achieve a desired power of DGE detection with 
control on the number of false DGE calls? Based on the 
standard normal Z test, several authors have presented 
results on calculating the number of microarrays needed 
to achieve given statistical power while controlling false 
positive rate 0, El ■ These results would be applicable if 
we knew u\ for each gene. In reality though the variances 
cannot be assumed known, and more often than not, the 
number of microarrays used to estimate the variances 
is rather small. It is therefore necessary to use t-based 
test rather than the standard normal test. Other au- 
thors have also presented approximate formulas [TtI ITsf 
for calculating the power of the traditional two-sample 
t test with equal variance. In this paper we present an 
exact formula for calculating the power of the t-based 
statistical test developed here. 



C. Determination of the threshold t-value 

When the numbers of forward and reverse labelled mi- 
croarrays are given, setting P t h is equivalent to setting a 
threshold, say |£|, for the statistics to defined in Ea. (|10|l . 
With this threshold t-value, our criteria for claiming a 
DGE is as follows: If to > \£,\, the gene is claimed as 
up-regulated {E t — E c > 0); if to < — it is claimed as 
down-regulated {E t — E c < 0) . So the rate at which false 
positive claims are made is 

r- ISI r°° 

Pth = / {t )dt Q + / Pn f +n r -2{to)dt 

J-oo J\£\ 

= 2 / Pn f +n r -2{to)dt = 2T n/+ „ r _ 2 (- 1) 

J — OG 

where p r {x) is the probability density function (PDF) of 
the Student's distribution with r degrees of freedom, and 
T r {.) is the cumulative probability distribution function 
(CDF) for the Student's t distribution. It is therefore 
apparent that the threshold t-value |£| can be obtained 
by solving the equation 2T Il/+Jlr _2(~ = Pth with a 
given false positive rate Pth- 



D. Successful detection rate 

The successful detection rate is the rate at which 
DGE is correctly identified (either up-regulated or down- 
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regulated). If a gene has E t — E c = \i > 0, the success- 
ful detection rate for this gene is the probability that 
to > |£| is observed. On the other hand, if a gene has 
Et — E c = n < 0, the successful detection rate equals the 
probability that t < — 1£| is observed. It can be shown 
(see supplementary information I) that in both cases, the 
rate at which the genes behavior is correctly identified, 
i.e. (J, > or fj, < 0, can be described by the following 
equation 



S n f ,n r , — , £ 



Pn f +n r -2(Y) X 



-lei 



Y 



n f 



•2(H 



nf + n r 



dy, (12) 



where p r {Y) is the PDF for the \ 2 distribution with r 
degrees of freedom, and $(.) is the CDF for the standard 
normal distribution. 

Therefore the successful detection rate S is a function 
of rif, n r , \h\/<jt, and where |£| can be obtained by 
solving Ea. (|ll|) at a given P t h- Eventually, S is a function 
of P t h, nf, n r , and \fi\/a T - 



E. Usage of the S function 

We have implemented the calculation of the 5* function 
as a Java application, which is accessible through the 
URL given in the abstract. Two look-up tables also are 
provided in the supplementary for some typical results of 
S for quick reference. Experiment designers can use these 
to find the value of S at given parameters rif, n r , |/x|/ctt, 
and Pth, thus get some general idea of what percentage of 
truly DGEs can be detected by their experimental design. 

The applicability of the S function can be seen from 
two perspectives. First, for the user who has not car- 
ried out any microarray experiments on their system 
before, the total variances (erf.) will be completely un- 
known. In this situation the S function can serve as a 
post-experiment assessment to inform the user of the de- 
tection rate in their experiment based on the observed 
values of R and s 2 from the measurements. For exam- 
ple, 3 forward and 3 reverse labelled microarrays, with 
5000 genes printed on each microarray, were used in a 
experiment. The tolerance for false positives is set at 
Nf p — 2, and for simplicity the threshold p- value is set 
as Pth = 2/5000. If most genes have an s 2 around 1, 
then the typical value of erf for the set of genes is 1. We 
can now ask: for genes with two-fold differential expres- 
sion and typical variance, what percentage of them can 
be correctly detected by this experiment? Remember- 
ing that a two-fold differential expression corresponds to 
/j = Et — E c = 1 or /j = E t — E c = —1, we have = 1 
and ctt = 1. Using the S calculator or the look-up ta- 
bles (Supplementary Table I) we find that the successful 
detection rate for rif = 3, n r — 3, Pth — 2/5000, and 
\h\/<jt = 1 is 9.08 x 10~ 3 , which means that in this ex- 
periment only 0.908% of genes with two-fold DGE and 



with typical variance 1 can be detected, the remaining 
99% are missed. If the same question was asked about 
genes with four-fold DGE and one decides to tolerate 
Nfp — 8 false positive claims and the threshold is set 
at Pth — 8/5000, then the successful detection rate for 
P th = 8/5000, n f = 3, n r = 3, and \(J.\/a T = 2 is 0.217, 
which means 21.7% of them are successfully detected. If 
the detection rate is unsatisfactory, then more forward 
and reverse microarray datasets need to be added. 

Second, if there is some general knowledge of total vari- 
ance from previous experiments or other sources, then a 
target for the detection rate can be set. In this case, the 
S function will assist in the determination of how many 
forward and reverse microarrays are required in the ex- 
periment. For example, if from previous experience we 
know that the typical value of the total variance for the 
set of genes under consideration is erf = 0.25, which gives 
ctt = 0.5; A microarray experiment is now designed to 
identify DGEs between the treated and the control with 
a tolerance of 8 false positive claims out of 5000 genes 
being tested with P t h = 8/5000 for simplicity; The pre- 
set target is that after this experiment no less than 60% 
of genes with two-fold DGE and with typical variance 
should be detected; How many forward and reverse la- 
belled microarrays are needed? As before, two-fold DGE 
corresponds to = 1, so one has \h\/<tt — 2. Us- 
ing the look-up tables (Supplementary Table II, in the 
|/x|/ctt = 2 panel and P t h = 8/5000 column), one finds 
that the row n/ = n r = 4 gives a detection rate S = 0.605 
which is closest to meet the target. Therefore 4 forward 
and 4 reverse labelled microarrays are required in this 
experiment. 



V. CONTROLLING FALSE POSITIVES 
A. Procedures 

In this section, we explore further on how to effectively 
control false positives in a multiple test situation. Gener- 
ally speaking, all different multiple-testing methods even- 
tually amount to effectively setting a threshold p-value, 
and then rejecting all the null hypothesis with p-value 
below this threshold. For example, the classical Bonfer- 
roni multiple-testing procedure controls family- wise error 
rate at a by setting the threshold Pth = ce/N, where N 
is the total number of hypothesis tested. In this study, 
we aim to control the number false positives such that 
the expectation of Of p equals Nf P , our pre-set target. 
As discussed in Section TlV Bl to achieve this, we should 
set Pth = Nfp/ No, which requires an estimation of Nq or 
Nq/N, the fraction of null genes in the set. 

We present three procedures here for setting P t h to 
control false positives: 

Procedure A: Suppose we have made an estimation of 
N /N as c, then set P th = N fp /(cN). The method for 
calculating c will be discussed below. 
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Procedure B: Set P t h — Nf p /N. This can be seen as 
using c = 1 as the crudest estimation of N /N. 

Procedure C: Suppose genes are sorted by their ascend- 
ing p-values, so that p\ < pi < P3 < ■ • • < pn, where pi 
is the p-value for gene i. Set Pth = Pi*, where i* is the 
largest index satisfying pi[N — i + mm(i,Nf p )] < Nf p . 
This can be seen as estimating Nq/N by c = [N — i* + 
min(i*, Nf p )]/N. The idea behind this is that if gene i* 
and all genes indexed below it are to be declared DGEs, 
these genes should not contribute to the fraction of null 
genes. Thus this represents some improvement over the 
crudest estimation c = 1. 

We have performed simulations to compare the per- 
formances of the three procedures. Procedure A allows 
us to achieve the highest rate of DGE detection among 
the three, and the observed false positives Of p matches 
our preset target Nf p statistically. Procedure B does not 
estimate Nq/N effectively, and it is the most conserva- 
tive procedure. So Procedure A is recommended over 
C and B (See Supplementary for details on simulation 
procedures and data). 

Benjamini and Hochberg |l9j proposed the FDR ap- 
proach to control the false discovery rate (FDR) at q by 
setting = i*q/N, where i* is the largest index sat- 
isfying pi < iq/N. The false discovery rate was defined 
as the expectation of the ratio of false to total positives, 
i.e., q = E(Of p /i*). When the FDR procedure controls 
false discovery rate at q, the observed false discovery rate 
Of p /i* should have value around q, i.e., q w Of p /i*, 
which gives Pth — i*q/N w Of p /N. The expectation of 
the threshold p-value under the FDR procedure is there- 
fore E(P th ) w E(O fp /N) = N fp /N. It is thus clear that 
the FDR procedure of [fSj is on average equivalent to 
Procedure B in this section. 



B. Estimating N /N 

Pounds and Morris |2(J recently proposed the use of 
a beta-uniform mixture (BUM) function to approximate 
the distribution of p-values from a set of genes tested, and 
estimate the fraction of null genes in the set. Here we pro- 
pose another method to estimate Nq/N, which does not 
requires the BUM form of distribution of p-values. The 
aim was to achieve a more accurate estimation of the frac- 
tion of null genes. As in |2(il |. we wanted to extract a uni- 
form density from the observed distribution of p-values. 
To achieve this, the genes were first sorted by their as- 
cending p-values, so that pi < P2 < Pz < • • ■ < pn, where 
Pi is the p-value for gene i. Then an empirical cumulative 
distribution of p-values can be easily obtained by plot- 
ting i/N versus pi. The idea was to find a straight line 
tangent to the cumulative distribution curve with min- 
imum slope. Taking into account that the cumulative 
distribution curve is a non-decreasing function ending at 
the point (1.0, 1.0), the minimum slope was found as fol- 
lows. Each point (pi, i/N) on the cumulative distribution 
plot was connected with the ending point (1.0, 1.0) with 



a straight line, and the slope of the line calculated as 
Cj = (1.0 — i/N) / (1.0 — pi). Then the minimum of Cj at 
a given range of p-value, say Pi < Pi < P u , was found 

c mm = min(c 4 \ Pi < p l < P u ). (13) 

i 

Cmin can be used as our estimation of the fraction of null 
genes in the set. 

We have carried out simulations to test the perfor- 
mance of Eq.lJTS}, and found that it tends to underes- 
timate the true value of Nq/N. Instead, using median 
slope as the estimation of Nq/N gives more accurate re- 
sults than the minimum slope. We thus use the following 
equation to estimate the fraction of null genes 

Cmid = median(ci | Pi < p t < P u ). (14) 

In a recent paper plj . Storey and Tibshirani used a 
natural cubic spline to fit the data of Ci as a function 
of pi for a given range of p-values, then took the value 
of the spline at p = 1 as the estimation of Nq/N. We 
compared the Storey-Tibshirani method with Ea. (|14ll . 
an advantage of the latter is that it is computationally 
much simpler than the Storey-Tibshirani method. As can 
be seen from Table H| both our method and the Storey- 
Tibshirani method become more accurate as N and/or 
Nq/N increases, and in all the cases our method gives 
slightly better results, as indicated by the coefficient of 
variation. 

As for the values of Pi and P u , a practical guidance 
for choosing them is to set Pi a value between 0.4 and 
0.5, and P u between 0.9 and 0.95. In fact, Ea. l(Ti|l gives 
quite robust results with respect to changing the val- 
ues of Pi and P u within the recommended range. For 
a set of simulation tests with true null fraction 0.8, using 
(P U P U ) = (0.4,0.95) gives c mid = 0.800 ± 0.023, while 
using (Pt,P u ) = (0.5, 0.9) gives c mid = 0.800 ± 0.024. 

The method here to estimate Nq/N does not depend 
on the specific form of statistical tests being used, as 
long as the p-values pertaining to the tests are obtained. 
But similar to the BUM method and the the Storey- 
Tibshirani method, the method we are proposing here 
also implicitly assumes that the multiple test statistics 
are independent, or at least the true null statistics are 
independent. In the context of microarray experiments, 
this would require that the null genes' expressions are 
independent of each other. This may be not realistic, 
thus the estimation of the fraction of null genes based on 
these methods will be less accurate. An extreme exam- 
ple is when all the null genes in each biological sample 
behave in a concerted manner, and all the non-null genes 
express in a synchronized way, then the p-values we ob- 
serve will be concentrated on two separate points, one for 
all the null genes and one for the non-null genes. Such a 
situation will defy all the methods for estimating Nq/N 
discussed here. Estimating the fraction of null genes with 
possibly strong inter-gene dependence is an important is- 
sue, and probably a very difficult one, especially without 
specifying their structure of interdependence beforehand. 
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This is beyond the scope of current study, and is an issue 
worth of future investigation and continuous efforts. Un- 
til further statistical advances are made in this respect, 
the method we proposed in this paper can serve as an 
approximation for estimating the fraction of null statis- 
tics. 



VI. DISCUSSION 

The data volume generated by microarray studies com- 
bined with the intrinsic variability of the system demands 
that rigorous statistical analysis be applied to the data to 
avoid the problem of false positives and/or low successful 
rate in DGE detection. In this study we have taken into 
account all the major variables associated with microar- 
ray data. The procedure proposed in this paper deals 
with fluorescent label bias often present in microarray ex- 
periments. A t statistic has been derived for hypothesis 
testing based on a model that describes each gene indi- 
vidually with its own set of parameters. An advantage 
of this design is that if there exists any fluorescent biases 
(D g 7^ D r ) for some genes they will be corrected by the 
reverse labelling procedure. For genes with no fluores- 
cent bias (for example, some genes may have D g = D r ) 
the method will perform equally satisfactorily. 

In this work, we have adopted the normality assump- 
tion, which leads to the test statistic to following the 
Students' t distribution under the null hypothesis. Thus 
the successful detection rate S can be calculated in closed 
form. While the normality assumption seems reasonable 
with common technologies, especially for the measure- 
ment error e V i SCl large scale replicate experiments have 
not yet been performed to make a precise assessment [lj . 
If normality is not met, R defined in Eq.© will continue 
to be an unbiased estimator of the quantity of interest 
but to will not follow Students' t distribution. In this case 
some non-parametric methods [52, 0, could be 
employed. While those methods can be readily applied 
to microarrays with a common reference design, where 
the systematic dye bias subtracts out in the calculation 
of the test statistic, the application of those methods to 
the direct comparison design needs to be further devel- 
oped and investigated. If non-parametric methods have 
to be used the rate of successful detection cannot be as 
readily calculated as in Ea. (|T^|) . 

In the published literature it is a common practice to 
apply some form of normalization (global or local) to re- 
move systematic biases before the statistical analysis of 
microarray data. Here we are proposing to remove much 
of the systematic bias by experimental means, i.e. by 
a dye-swapping procedure. Since the model deals with 
the fluorescent bias for each gene individually, no other 
local normalization procedure (e.g. LOWESS 7]) should 



be applied before the statistical testing procedure given 
here. However, some form of global normalization is ap- 
propriate, such as that utilized by Pollack et al (2(|, or 
that described in [||, where the log-ratios in a microar- 
ray dataset are globally shifted so that the most probable 
value of log-ratio becomes 0. The purpose of global nor- 
malization is to adjust the effect of global factors that 
could generally affect the fluorescence, such as a differ- 
ence between the overall concentrations of two mRNAs, 
and possibly the difference of photo- amplifier voltages 
used between the two fluorescent channels when the mi- 
croarray image was scanned. All the local feature-specific 
bias is looked after by the reverse labelling and statistical 
testing procedure proposed here. 

Finally a word for the overworked bench researcher 
facing the prospect of multiple hybridizations in order to 
achieve a reasonably high level of S without having to 
contend with an unsatisfactory false positive rate. What 
can be regarded as reasonable? This depends on the de- 
sired outcome of the experiment. If for example the inter- 
est is in defining genes which might give rise to differential 
susceptibility, then there will be a desire to have a high 
value of S in order not to miss any potential candidate 
genes. There would be two ways of achieving this, cither 
by increasing the number of hybridizations or by accept- 
ing a higher false positive rate. In an experiment such as 
the one described then the candidate genes will probably 
be verified by other methods downstream. Therefore the 
balance is driven by the need to achieve a high S and 
the decision is between whether it is more economical to 
use more microarrays, or put more resource into down- 
stream verification. Where no downstream verification 
of DGEs identified in a microarray experiment are pro- 
posed then it is essential to maintain a low value of false 
positive rate, at the expense of S if the total number 
of microarrays is limiting. This study does not seek to 
put a figure on the number of microarrays that should 
be hybridized in an experiment. Rather a framework is 
provided for the experiment designer to decide on the 
number of microarrays to hybridize taking into account 
the system, availability of sample, downstream analysis 
primarily and the objective of the experiment. 
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TABLE I: The fraction of null genes as estimated by Eg. 11411 (c m id) and by the Storey-Tibshirani method (710). Parameters 
used are: = 1, <7t = 0.5,n/ = 2, n r — 2, Pi — 0.4, P u — 0.95. Results are based on 16 simulations for each cell in the table, 
cv, the coefficient of variation, is defined as the standard deviation divided by the true value of null fraction, No/N. 



N /N 


N = 100 
mean stdev cv 


N = 500 
mean stdev cv 


N = 1000 
mean stdev cv 


N = 5000 
mean stdev cv 


0.2 Cmid 
0.2 7T0 


0.186 0.040 0.200 
0.158 0.109 0.544 


0.205 0.017 0.085 
0.209 0.061 0.307 


0.197 0.013 0.067 
0.178 0.042 0.212 


0.201 0.009 0.044 
0.203 0.017 0.087 


0.8 c mi d 

0.8 7T0 


0.767 0.112 0.140 
0.724 0.284 0.355 


0.807 0.047 0.059 
0.785 0.097 0.121 


0.805 0.031 0.038 
0.792 0.064 0.080 


0.800 0.023 0.029 
0.807 0.060 0.075 



